Abstract

We study sequential transmission of a stream of messages over a block-fading multi-input-multi-output (MIMO)
channel. A new message arrives in each coherence block and the decoder is required to output each message after a
delay of $T$ coherence blocks. We establish the optimal diversity-multiplexing tradeoff (DMT) and show that it can be
achieved using a simple interleaving of messages. The converse is based on an outage amplification technique which
appears to be new. We also discuss another coding scheme based on a sequential tree-code. This coding scheme only
requires the knowledge of delay at the decoder and yet realizes the optimal DMT. We finally discuss some extensions
when multiple messages at uniform intervals arrive within each coherence period.

I. Introduction 

Many multimedia applications require real-time encoding of the source stream and a sequential reconstruction
of each source frame by its playback deadline. Both the fundamental limits and optimal communication techniques
for such streaming systems can be very different from classical communication systems. In recent years there has
been a growing interest in characterizing information theoretic limits for delay constrained communication over
wireless channels. When the transmitter has channel state information (CSI), a notion of delay-limited capacity
can be defined [2]. For slow fading channels, the delay-limited capacity is achieved using channel inversion at the
transmitter [3]. In the absence of transmitter CSI, an outage capacity can be defined [4], [5]. An alternative notion of
expected capacity using a broadcast strategy has also been proposed [6] in such scenarios. Each fading state maps
to a virtual receiver and a broadcast coding technique is used at the encoder. This approach has been further treated
in, e.g., [7]-[10]. For related work on joint source-channel coding over fading channels, see, e.g., [11]-[21] and the
references therein.
The present paper studies delay constrained streaming over multi-antenna wireless channels. We assume a block
fading channel model and assume that the transmitter observes a sequence of independent messages, one in each
coherence block. The encoded signal is a causal function of the messages. The decoder is required to output each
message with a maximum delay of $T$ coherence blocks. As our main contribution we characterize the
diversity-multiplexing tradeoff of this delay-constrained streaming model and refer to it as streaming-DMT.

Diversity-multiplexing tradeoff (DMT) of the quasi-static (slow-fading) channel model was first introduced in [22].
The authors propose diversity order and multiplexing gain as two fundamental metrics for communication over a
wireless channel, and establish a tradeoff between these. A significant body of literature on DMT for quasi-static
fading channels already exists, both in performance analysis and practical code constructions; see, e.g., [23]-[38].
All these works assume the transmission of a single message and do not consider the streaming setup. As we
discuss in the sequel, the quasi-static model falls out as a special case of our model when we set the delay $T = 1$.

Fig. 1. Proposed Streaming Model. One new message arrives at the start of each coherence block. The message stream is encoded sequentially
and each message needs to be output at the receiver after $T$ coherence blocks. In the above figure $T = 2$.

Delay-universal streaming has been studied in, e.g., [39]-[45]. A tree-based code is proposed to encode a
sequence of messages. The decoder is required to produce an estimate of each past message (at each time) under 
the constraint that the error probability decreases exponentially with the delay. Such a delay-universal (anytime) 
constraint is motivated by an application of stabilizing a control plant over a noisy channel. A maximum likelihood 
decoder is studied and the associated error exponent is characterized for a variety of discrete memoryless channels. 
In contrast the present work focuses on the streaming-DMT of block fading channels. We show that for a fixed 
decoding delay, the optimal DMT is achieved using a surprisingly simple interleaving technique, and does not 
require a tree code. We also study the performance achieved by a tree based encoder and a decision feedback based 
decoder and show that such a scheme can achieve the streaming-DMT in a delay universal manner. For related 
work on erasure channels we point the reader to, e.g., [46]-[52] and references therein.

II. Model 

Our setup is illustrated in Fig. 1. We consider an independent identically distributed (i.i.d.) block fading channel
model with a coherence period of $M$:

$$\mathbf{Y}_k = \mathbf{H}_k \cdot \mathbf{X}_k + \mathbf{Z}_k, \tag{1}$$

where $k = 0, 1, \ldots$ denotes the index of the coherence block of the fading channel. The matrix $\mathbf{H}_k \in \mathbb{C}^{N_r \times N_t}$
denotes the channel transfer matrix in coherence period $k$. We assume that the transmitter has $N_t$ transmit antennas
and the receiver has $N_r$ receive antennas. The input

$$\mathbf{X}_k = [\mathbf{X}_k(1) \mid \cdots \mid \mathbf{X}_k(M)] \in \mathbb{C}^{N_t \times M}$$

is a matrix whose $j$-th column, $\mathbf{X}_k(j)$, denotes the vector transmitted in time-slot $j$ in coherence block $k$, and
similarly $\mathbf{Y}_k \in \mathbb{C}^{N_r \times M}$ is a matrix whose $j$-th column, $\mathbf{Y}_k(j)$, denotes the vector received in time-slot $j$ in block
$k$. The additive noise matrix is $\mathbf{Z}_k \in \mathbb{C}^{N_r \times M}$. Thus (1) can also be expressed as

$$\mathbf{Y}_k(j) = \mathbf{H}_k \cdot \mathbf{X}_k(j) + \mathbf{Z}_k(j), \quad j = 1, \ldots, M. \tag{2}$$

We assume that all entries of $\mathbf{H}_k$ are sampled independently from the complex Gaussian distribution$^1$ with zero
mean and unit variance, i.e., $\mathcal{CN}(0, 1)$. The channel remains constant during each coherence block and is sampled
independently across blocks. All entries of the additive noise matrix $\mathbf{Z}_k$ are also sampled i.i.d. $\mathcal{CN}(0, 1)$. Finally
the realization of the channel matrices is revealed to the decoder, but not to the encoder.

We assume an average (short-term) power constraint $E\left[\sum_{i=1}^{M} \|\mathbf{X}_k(i)\|^2\right] \le M\rho$. Note that $\rho$ denotes the transmit
signal-to-noise-ratio (SNR). We will limit our analysis to the case where $M$ is sufficiently large so that random
coding arguments can be invoked within each coherence block. A delay-constrained streaming code is defined as
follows:

Definition 1 (Streaming Code): A rate $R$ streaming code with delay $T$, $\mathcal{C}(R, T)$, consists of

1. A sequence of messages $\{w_k\}_{k \ge 0}$, each distributed uniformly over the set $\mathcal{I}_M = \{1, 2, \ldots, 2^{MR}\}$.

2. A sequence of encoding functions $\mathcal{F}_k : \mathcal{I}_M^{k+1} \to \mathbb{C}^{N_t \times M}$,

$$\mathbf{X}_k = \mathcal{F}_k(w_0, \ldots, w_k), \quad k = 0, 1, \ldots \tag{3}$$

that map the input message sequence to a codeword $\mathbf{X}_k \in \mathbb{C}^{N_t \times M}$.

3. A sequence of decoding functions $\mathcal{G}_k : \mathbb{C}^{N_r \times M(k+T)} \to \mathcal{I}_M$ that output message $\hat{w}_k$ based on the first $k + T$
observations, i.e.,

$$\hat{w}_k = \mathcal{G}_k(\mathbf{Y}_0, \ldots, \mathbf{Y}_{k+T-1}), \quad k = 0, 1, \ldots \tag{4}$$

We now define the diversity-multiplexing tradeoff (DMT) [22] associated with the streaming code $\mathcal{C}(R, T)$. Let
the error probability for the $k$-th message be $p_k = \Pr(\hat{w}_k \ne w_k)$, where $\hat{w}_k$ is the decoder output and the
error probability is averaged over the random channel gains. Let $\Pr(e) = \sup_{k \ge 0} p_k$ denote the worst-case error
probability$^2$. The DMT [22] pair $(r, d)$ is achievable with delay $T$ if there exists a sequence of codebooks
$\mathcal{C}(R = r \log \rho, T)$ such that

$$d = \lim_{\rho \to \infty} -\frac{\log \Pr(e)}{\log \rho}, \qquad r = \lim_{\rho \to \infty} \frac{R}{\log \rho}.$$

Of interest is the optimal diversity-multiplexing tradeoff, denoted by $d_T(r)$.

III. Main Result 

The optimal tradeoff between diversity and multiplexing (DMT) for the quasi-static fading channel was
characterized in [22]. We reproduce the result below for the convenience of the reader.

Theorem 1: ([22]) For the quasi-static fading channel

$$\mathbf{Y}(t) = \mathbf{H} \cdot \mathbf{X}(t) + \mathbf{Z}(t) \tag{5}$$

where the entries of $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ are sampled i.i.d. $\mathcal{CN}(0, 1)$, the optimal DMT $d_1(r)$ is a piecewise linear
function connecting the points $(k, d_1(k))$ for $k = 0, 1, \ldots, \min(N_r, N_t)$, where $d_1(k) = (N_r - k)(N_t - k)$.

□

$^1$ While we only focus on the Rayleigh channel model, our result easily extends to other channel models.

$^2$ We caution the reader that this is not the maximum error probability with respect to a single realization of the fading state sequence. This
latter quantity is clearly 1, as in any sufficiently long realization we will eventually find at least one block that leads to an outage. In our
definition we fix an index $k$ and find the error probability $p_k$ averaged over the channel gains. We subsequently search for an index $k$ with
maximum error probability. For time-invariant coding schemes, due to symmetry, $p_k$ will be independent of $k$.

In our analysis the following simple generalization of the quasi-static DMT to $L$ parallel channels is useful.

Corollary 1: Consider a collection of $L$ parallel quasi-static fading channels

$$\mathbf{Y}_l(t) = \mathbf{H}_l \cdot \mathbf{X}_l(t) + \mathbf{Z}_l(t), \quad l = 1, \ldots, L \tag{6}$$

where the entries of $\mathbf{H}_l \in \mathbb{C}^{N_r \times N_t}$ are all sampled i.i.d. $\mathcal{CN}(0, 1)$. The DMT is given by $d_L(r) = L \cdot d_1\left(\frac{r}{L}\right)$
for any $r \in (0, L \min(N_r, N_t))$.

□

We provide a proof of Corollary 1 in Appendix A.

Our main result establishes the optimal DMT for a block fading channel model with a delay constraint of T 
coherence blocks. 

Theorem 2: The optimal DMT for a streaming code in Definition 1 with a delay of $T$ coherence blocks
is given by $d_T(r) = T \cdot d_1(r)$, where $d_1(r)$ is the optimal DMT of the underlying quasi-static fading channel.

□ 

The result in Theorem [2] illustrates that the DMT of a streaming source under a delay constraint of T coherence 
blocks is identical to the DMT of a system with T independent and parallel MIMO channels if the rate of the latter 
system is suitably normalized. Indeed our achievability scheme exploits this connection. We show that the DMT 
can be achieved by interleaving messages in a suitable manner to reduce the system to a parallel channel setup. We 
also present another scheme based on a tree code that achieves the DMT. The converse however does not appear 
to follow from earlier results. We present, what appears to be a new idea called "outage-amplification" which is 
specific to our streaming setup. 
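
As a small numerical companion to Theorems 1 and 2 and Corollary 1, the sketch below evaluates the three tradeoff curves. The function names (`d1`, `d_parallel`, `d_streaming`) are ours and purely illustrative; the formulas are the ones stated in the text.

```python
def d1(r, n_r, n_t):
    """Quasi-static DMT of Theorem 1: piecewise linear through (k, (n_r-k)(n_t-k))."""
    m = min(n_r, n_t)
    if not 0 <= r <= m:
        raise ValueError("multiplexing gain must lie in [0, min(n_r, n_t)]")
    k = min(int(r), m - 1)                  # left corner point of the active segment
    left = (n_r - k) * (n_t - k)            # d1 at r = k
    right = (n_r - k - 1) * (n_t - k - 1)   # d1 at r = k + 1
    return left + (r - k) * (right - left)


def d_parallel(r, L, n_r, n_t):
    """Corollary 1: L parallel channels carrying total multiplexing gain r."""
    return L * d1(r / L, n_r, n_t)


def d_streaming(r, T, n_r, n_t):
    """Theorem 2: streaming DMT with a delay of T coherence blocks."""
    return T * d1(r, n_r, n_t)
```

For a single-antenna link with $T = 2$ and $r = 0.5$, `d_streaming(0.5, 2, 1, 1)` and `d_parallel(1.0, 2, 1, 1)` coincide, which is the rate normalization mentioned above: streaming at gain $r$ behaves like $T$ parallel channels at total gain $T r$.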

In the remainder of this paper we present the converse in Section IV and two coding schemes for achieving the
optimal DMT in Sections V and VI respectively. We discuss an extension to the case when two messages arrive
within each coherence block in Section VII.

IV. Converse 

A. Discussion 

We first present simple bounds that suffice to bound the maximum diversity and maximum multiplexing gain, 
but are not tight in between. We then provide a heuristic explanation of the new dimension of our proof in tying 
together the bounding results for individual messages to take into account their overlapping transmission times. A 
formal proof of the converse is presented in Section IV-B.

For our discussion we consider a single-antenna channel model when $T = 2$. Consider the decoding of $w_0$ from
blocks 0 and 1. One upper bound is obtained by revealing $w_1$ to the decoder, resulting in $d_0^{+}(r) = 2 - r$. Another
upper bound is obtained by revealing $w_1$ to the encoder at time $t = 0$, revealing $w_2$ to the decoder and relaxing the
delay of $w_0$ to $T = 3$. The setup is identical to a system with three parallel channels and two messages and hence
$d_1^{+}(r) = 3 - 2r$. Generalizing this argument so that $w_0, \ldots, w_N$ are revealed to the encoder at time $t = 0$, revealing
$w_{N+1}$ to the decoder, and relaxing the delay constraints so that all the messages are decoded at the end of block
$N + 1$, results in $d_N^{+}(r) = (N + 2) - (N + 1) \cdot r$. These bounds are illustrated in Fig. 2. While the bound $d_0^{+}(r)$
provides a tight bound on the diversity order 2 and $d_\infty^{+}(r)$ provides a tight bound on the maximum multiplexing
gain, it can be easily verified that the intersection of these bounds does not yield a converse that matches the claim
of Theorem 2.

Fig. 2. Simple bounds on the single-antenna model for $T = 2$. The actual DMT is $d_2(r) = 2(1 - r)$. Among the sequence of upper
bounds $d_N^{+}(r)$, note that $d_0^{+}(r)$ is tight at the maximum diversity point while $d_\infty^{+}(r)$ is tight at the maximum multiplexing point. However the
intersection of these bounds is not tight enough to yield $d_2(r)$.
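
The gap between the simple bounds and the claimed tradeoff can be checked numerically. The sketch below, for the single-antenna case with $T = 2$, evaluates the family of bounds $(N+2) - (N+1) r$ and compares their pointwise minimum with $2(1 - r)$. The function names and the truncation `n_max` are illustrative assumptions, not part of the original argument.

```python
def genie_bound(r, n):
    """Simple upper bound (n + 2) - (n + 1) * r from the discussion above."""
    return (n + 2) - (n + 1) * r


def best_simple_bound(r, n_max=50):
    """Pointwise minimum of the simple bounds over n = 0, ..., n_max."""
    return min(genie_bound(r, n) for n in range(n_max + 1))


def streaming_dmt_siso(r, T=2):
    """Claimed single-antenna streaming DMT, T * (1 - r)."""
    return T * (1 - r)
```

At $r = 0.5$ the best simple bound is 1.5 while the claimed DMT is 1.0, so the intersection of the simple bounds is strictly loose in the interior of the interval.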

We illustrate the main idea of our proposed technique in Fig. 3 and Fig. 4 and provide some intuition below.
Assume that a DMT larger than that claimed in Theorem 2 is achievable; assume that $d(r) = 2(1 - r + 2\delta)$ for
some $\delta > 0$. This implies that $\Pr(\hat{w}_k \ne w_k) \,\dot{\le}\, \rho^{-2(1 - r + 2\delta)}$ holds for each $k \ge 0$. Now suppose that the gains $h_0^{k+1}$
belong to the set $\mathcal{H}_k$ defined as:



U k = \ (ho,...,h k+1 ) : |^o| 2 > 
Fig. 3. Streaming setup when $T = 2$. The source data stream is observed sequentially and mapped into the channel stream. If we assume
that $d(r) > 2(1 - r)$ holds then message $w_k$ can be recovered even if the channel gains in blocks $k$ and $k + 1$ (shaded) are in outage. This
is indicated in the upper figure with blocks $k$ and $k + 1$ shaded. Our outage amplification argument amplifies this effect and shows that we
can decode every data packet by its deadline even if all the channel gains are in outage, as illustrated in the lower figure.

Fig. 4. Illustration of the outage amplification argument. Assume that the channel gains $h_0, \ldots, h_4$ are all in outage. For decoding $w_0$ only
the values of $h_0$ and $h_1$ matter. The message $w_0$ thus can be decoded from blocks 0 and 1. At this point we can reconstruct $X_0$ as it only
depends on $w_0$ and treat block 0 as if it were not in outage. Then using blocks 1 and 2 we can decode $w_1$ and recover $X_1$. Repeating this procedure
we can proceed to decode all the messages sequentially.



From standard analysis [22] we have$^3$

\begin{align}
\Pr(h_0^{k+1} \in \mathcal{H}_k) &\doteq \rho^{-2(1 - r + \delta)} \tag{8}\\
&\,\dot{>}\, \Pr(\hat{w}_k \ne w_k). \tag{9}
\end{align}

Thus the receiver cannot declare an outage event for the set $\mathcal{H}_k$. It turns out that given this fact, one can exploit
the streaming nature of the problem to show successful decoding over a much weaker set of channels. For each
$N \ge 1$ let

$$\mathcal{H}_N^{\delta} = \left\{ h_0, \ldots, h_N : |h_k|^2 \le \rho^{-(1 - r + \delta)},\ 0 \le k \le N \right\} \tag{10}$$

denote the set where all the $N + 1$ links are simultaneously in outage. We reason that the above decoder must in
fact recover all the messages $w_0, w_1, \ldots, w_{N-1}$ by their respective deadlines.

$^3$ Throughout we use the notation $\doteq$ to denote equality in the exponential sense. The function $f(\rho) \doteq g(\rho)$ if $\lim_{\rho \to \infty} \frac{\log f(\rho)}{\log g(\rho)} = 1$. We
define $\dot{\le}$ and $\dot{\ge}$ in a similar manner.
Consider any $h_0^N \in \mathcal{H}_N^{\delta}$. Since $h_0^1 \in \mathcal{H}_0$ the decoder can recover $w_0$ at the end of block 1. Upon recovering $w_0$,
the decoder also recovers the transmitted codeword $X_0(w_0)$ and can replace the original channel output $Y_0$ with
$\tilde{Y}_0 = 1 \cdot X_0 + Z_0$. The effective sequence of channel gains $\tilde{h}_0^2$ with $\tilde{h}_0 = 1$ and $\tilde{h}_1^2 = h_1^2$ now satisfies $\tilde{h}_0^2 \in \mathcal{H}_1$.
Hence the decoder can decode message $w_1$ at the end of block 2. Continuing this argument, for the decoding of
message $w_k$, we assume that $w_0, \ldots, w_{k-1}$ have already been decoded and therefore the associated sequence $\tilde{h}_0^{k+1}$
with $\tilde{h}_j = 1$ for $j = 0, 1, \ldots, k - 1$ is contained in $\mathcal{H}_k$. Therefore $w_k$ is also decoded. Thus we see that the entire
sequence of messages can be decoded.

However for sufficiently large $N$, this leads to a contradiction. Notice that when $h_0^N \in \mathcal{H}_N^{\delta}$,

\begin{align}
\frac{1}{M} I(X_0^N; Y_0^N) &= \sum_{k=0}^{N} \log\left(1 + |h_k|^2 \rho\right) \tag{11}\\
&\,\dot{\le}\, (N + 1)(r - \delta) \log \rho \tag{12}
\end{align}

whereas the total message rate is $N r \log \rho$. Thus if $N > \frac{r}{\delta} - 1$, we have that the information rate decoded over
the channel exceeds the instantaneous capacity. This contradicts Fano's inequality. Hence our assumption that
$d(r) \ge 2(1 - r + 2\delta)$ cannot be true. Our formal proof that applies for any $N_r$, $N_t$ and $T$ is presented next.
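
The number of blocks needed for the contradiction follows from comparing $(N+1)(r - \delta)$ with $N r$: the former is strictly smaller exactly when $N \delta > r - \delta$. The sketch below finds the smallest such $N$ with exact rational arithmetic. The function name is hypothetical and the routine only illustrates this arithmetic, not the proof itself.

```python
from fractions import Fraction


def min_blocks_for_contradiction(r, delta):
    """Smallest integer N >= 1 with (N + 1) * (r - delta) < N * r."""
    r, delta = Fraction(r), Fraction(delta)
    if not 0 < delta < r:
        raise ValueError("need 0 < delta < r")
    n = 1
    # Capacity exponent of the N + 1 outage blocks versus the rate of N messages.
    while (n + 1) * (r - delta) >= n * r:
        n += 1
    return n
```

For example, with $r = 1/2$ and $\delta = 1/10$ the loop stops at $N = 5$: six blocks in outage support an exponent of $6 \cdot 0.4 = 2.4$, below the $5 \cdot 0.5 = 2.5$ demanded by five messages.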

B. Proof 

We establish that a lower bound on the error probability for any $\mathcal{C}(R = r \log \rho, T)$ code in Definition 1 is

$$\Pr(e) \,\dot{\ge}\, \rho^{-T d_1(r)}$$

where $d_1(r)$ is the DMT associated with a single-link MIMO channel. Define $\mathcal{E}_k = \{\hat{w}_k \ne w_k\}$, the error
event associated with message $w_k$, and note that $\Pr(e) = \sup_{k \ge 0} \Pr(\mathcal{E}_k)$.

We begin by lower bounding $\Pr(\mathcal{E}_k)$ associated with message $w_k$. Recall that this message needs to be decoded
after $T$ coherence blocks indexed as $t \in \{k, \ldots, k + T - 1\}$. Let $\mathbf{H}_k^{k+T-1} = H_k^{k+T-1}$ be the realization of the
channel matrices in this interval. Applying Fano's inequality [53] for message $w_k$ and using the fact that $w_0^{k-1}$ is
independent of $w_k$ and $\mathbf{H}_k^{k+T-1}$ we have

$$\Pr(\mathcal{E}_k; \mathbf{H}_k^{k+T-1} = H_k^{k+T-1}) \ge 1 - \frac{I(w_k; \mathbf{Y}_k^{k+T-1} \mid w_0^{k-1}, \mathbf{H}_k^{k+T-1} = H_k^{k+T-1})}{M r \log \rho} - \frac{1}{M r \log \rho}.$$

Since the second term vanishes as the coherence period $M \to \infty$, we ignore it in our analysis. To bound the
remaining terms we let

$$\mathcal{H}_\delta = \left\{ H : \log\det\left(I + \frac{\rho}{N_t} H H^\dagger\right) \le (r - \delta) \log \rho \right\} \tag{13}$$

and use $\mathcal{H}_\delta^T$ to denote the $T$-fold Cartesian product of the set $\mathcal{H}_\delta$. Furthermore since the channel gains are sampled
i.i.d.

$$\Pr(\mathbf{H}_k^{k+T-1} \in \mathcal{H}_\delta^T) = (P_\delta)^T \tag{14}$$

where $P_\delta = \Pr(\mathbf{H} \in \mathcal{H}_\delta)$. From the single link DMT in Theorem 1,

$$P_\delta = \Pr(\mathbf{H} \in \mathcal{H}_\delta) \doteq \rho^{-d_1(r - \delta)}, \tag{15}$$

where $d_1(\cdot)$ is the associated DMT function in Theorem 1. The average error probability
$\Pr(\mathcal{E}_k) = E_{\mathbf{H}}\left[\Pr(\mathcal{E}_k; \mathbf{H}_k^{k+T-1})\right]$ can be lower bounded as follows.

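\begin{align}
\Pr(\mathcal{E}_k) &\ge \Pr(\mathbf{H}_k^{k+T-1} \in \mathcal{H}_\delta^T) \cdot \left(1 - \frac{I(w_k; \mathbf{Y}_k^{k+T-1} \mid w_0^{k-1}, \mathbf{H}_k^{k+T-1} \in \mathcal{H}_\delta^T)}{M r \log \rho}\right) \tag{16}\\
&= (P_\delta)^T \left(1 - \frac{I(w_k; \mathbf{Y}_k^{k+T-1} \mid w_0^{k-1}, \mathbf{H}_k^{k+T-1} \in \mathcal{H}_\delta^T)}{M r \log \rho}\right) \notag\\
&= (P_\delta)^T \left(1 - \frac{I(w_k; \mathbf{Y}_k^{k+T-1} \mid w_0^{k-1}, \mathbf{H}_0^{N-1} \in \mathcal{H}_\delta^N)}{M r \log \rho}\right) \tag{17}\\
&\ge (P_\delta)^T \left(1 - \frac{I(w_k; \mathbf{Y}_0^{N-1} \mid w_0^{k-1}, \mathbf{H}_0^{N-1} \in \mathcal{H}_\delta^N)}{M r \log \rho}\right) \tag{18}
\end{align}

where (17) follows from the fact that the channel gains $(\mathbf{H}_0^{k-1}, \mathbf{H}_{k+T}^{N-1})$ are independent of $(w_0^k, \mathbf{Y}_k^{k+T-1}, \mathbf{H}_k^{k+T-1})$.
Now we combine the error events.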

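\begin{align}
\max_{0 \le k \le N-T-1} \Pr(\mathcal{E}_k) &\ge \frac{1}{N - T} \sum_{k=0}^{N-T-1} \Pr(\mathcal{E}_k) \notag\\
&\ge (P_\delta)^T \left(1 - \frac{\sum_{k=0}^{N-T-1} I(w_k; \mathbf{Y}_0^{N-1} \mid w_0^{k-1}, \mathbf{H}_0^{N-1} \in \mathcal{H}_\delta^N)}{(N - T) M r \log \rho}\right) \notag\\
&= (P_\delta)^T \left(1 - \frac{I(w_0^{N-T-1}; \mathbf{Y}_0^{N-1} \mid \mathbf{H}_0^{N-1} \in \mathcal{H}_\delta^N)}{(N - T) M r \log \rho}\right) \notag\\
&\ge (P_\delta)^T \left(1 - \frac{I(\mathbf{X}_0^{N-1}; \mathbf{Y}_0^{N-1} \mid \mathbf{H}_0^{N-1} \in \mathcal{H}_\delta^N)}{(N - T) M r \log \rho}\right) \tag{19}
\end{align}

where (19) follows from the data processing inequality since $w_0^{N-T-1} \to \mathbf{X}_0^{N-1} \to \mathbf{Y}_0^{N-1}$ holds. Finally since the
fading across different blocks is independent,

$$\max_{0 \le k \le N-T-1} \Pr(\mathcal{E}_k) \,\dot{\ge}\, \rho^{-T d_1(r - \delta)} \left(1 - \frac{N M (r - \delta) \log \rho}{(N - T) M r \log \rho}\right) \tag{21}$$

where (21) follows by substituting in (15).

For any $\delta > 0$, by selecting $N > \frac{T r}{\delta}$ the term inside the brackets is strictly positive. Since $\delta > 0$ is arbitrary, it
follows that a diversity order greater than $T d_1(r)$ cannot be achieved.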

V. Coding Theorem: Interleaving Scheme 
We now present an interleaving based scheme that achieves the DMT stated in Theorem 2. Our codebook $\mathcal{C}$
maps each message $w_k \in \mathcal{I}_M = \{1, 2, \ldots, 2^{M r \log \rho}\}$ to $T$ codewords $X_0(w_k), X_1(w_k), \ldots, X_{T-1}(w_k)$ of length
$\frac{M}{T}$, i.e., $\mathcal{C} = \mathcal{C}_0 \times \mathcal{C}_1 \times \cdots \times \mathcal{C}_{T-1}$ and $X_k \in \mathcal{C}_k$.
Fig. 5. Interleaving based coding scheme for $T = 6$. Each coherence block is divided into $T$ sub-intervals and each sub-interval is dedicated
to transmission of one message. The transmission of message $w_k$ spans coherence blocks $k, k + 1, \ldots, k + T - 1$ using codewords
$X_0(w_k), \ldots, X_{T-1}(w_k)$ as shown by the shaded blocks.



For transmission of these messages, we assume that each coherence block of length $M$ is further divided into $T$
sub-blocks of length $\frac{M}{T}$, as indicated in Fig. 5. Let $\mathcal{I}_{k,0}, \ldots, \mathcal{I}_{k,T-1}$ denote these intervals. The codeword $X_0(w_k)$
is transmitted in the first sub-block $\mathcal{I}_{k,0}$ of coherence block $k$. The codeword $X_1(w_k)$ is transmitted in the
sub-block $\mathcal{I}_{k+1,1}$ of coherence block $k + 1$ and likewise $X_j(w_k)$ is transmitted in the $j$-th sub-block of coherence block
$k + j$.
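
The sub-block assignment just described can be tabulated directly: sub-block $j$ of coherence block $k$ carries codeword $X_j$ of message $w_{k-j}$. The sketch below builds that table; the function names are ours and `None` marks sub-blocks at the start of the stream that carry no message yet.

```python
def interleaving_schedule(num_blocks, T):
    """schedule[k][j] is the index of the message sent in sub-block j of block k."""
    return [
        [k - j if k - j >= 0 else None for j in range(T)]
        for k in range(num_blocks)
    ]


def blocks_spanned(message_index, T):
    """Coherence blocks over which the T codewords of one message are spread."""
    return [message_index + j for j in range(T)]
```

With `T = 3`, block 2 carries messages `[2, 1, 0]` in its three sub-blocks, and message 1 is spread over blocks `[1, 2, 3]`, so each message sees $T$ independent fading realizations.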

The corresponding output sequences associated with message $w_k$ are denoted as

$$\mathbf{Y}_{k,j} = \mathbf{H}_{k+j} \cdot X_j(w_k) + \mathbf{Z}_{k,j}, \quad j = 0, \ldots, T - 1. \tag{23}$$

The decoder finds a message $\hat{w}_k$ such that for each $j \in \{0, \ldots, T - 1\}$, $(X_j(\hat{w}_k), \mathbf{Y}_{k,j})$ are jointly typical. The
outage event at the decoder is given by:

$$\left\{ \frac{1}{T} \sum_{j=k}^{k+T-1} C_j(\rho) \le r \log \rho \right\} \tag{24}$$

where $C_j(\rho) = \log\det\left(I + \frac{\rho}{N_t} \mathbf{H}_j \mathbf{H}_j^\dagger\right)$. Since (24) precisely corresponds to the outage event of a quasi-static parallel
MIMO fading channel, with $T$ channels and a multiplexing gain of $T \cdot r$, the DMT follows from Corollary 1.
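
To make the outage event concrete, the following sketch evaluates it for a single-antenna link, where $C_j(\rho)$ reduces to $\log(1 + \rho |h_j|^2)$. The function name and the use of base-2 logarithms are illustrative assumptions.

```python
import math


def interleaving_outage(gains_sq, rho, r):
    """Single-antenna outage test: average capacity over the T blocks <= r * log2(rho).

    gains_sq holds |h_j|^2 for the T coherence blocks spanned by one message.
    """
    T = len(gains_sq)
    avg_capacity = sum(math.log2(1.0 + rho * g) for g in gains_sq) / T
    return avg_capacity <= r * math.log2(rho)
```

With `rho = 100` and `r = 0.5`, two deeply faded blocks (`[1e-3, 1e-3]`) are in outage, whereas one good block paired with one faded block (`[1.0, 1e-3]`) is not: the message is lost only when the blocks it spans are jointly bad, which is the source of the factor $T$ in the diversity order.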

VI. Coding Scheme: Sequential Tree Codes 

We propose a second construction which is inspired by the sequential tree codes proposed in [39]-[45]. This
approach has the advantage that the delay does not need to be revealed to the encoder. The delay constraint only needs
to be revealed to the decoder and yet the optimal DMT is attained.

Our proposed streaming code, $\mathcal{C}(R, T)$, consists of a semi-infinite sequence of codebooks $\{\mathcal{C}_0, \mathcal{C}_1, \ldots, \mathcal{C}_k, \ldots\}$,
Fig. 6. The left hand figure illustrates the construction of our proposed codebook. The message $w_0$ is mapped to one of $2^{nR}$ codewords
in the first level, the message pair $(w_0, w_1)$ is mapped to one of $2^{2nR}$ codewords in the second level of the tree, etc. While decoding $w_k$ the
decoder starts at the root of the tree. It first finds all possible transmit paths of depth $k + T$ in the tree typical with the received sequence. If a
unique prefix codeword in level 1 is determined then the corresponding message $\hat{w}_0$ is decoded. At this point the decoder moves along the path
of $\hat{w}_0$ and finds all possible codewords from level 2 to $k + T$ that are typical with the received codeword. A unique message $\hat{w}_1$ is determined
if there is a unique prefix codeword in level 2. This process continues till level $k$ is reached and $\hat{w}_k$ is determined.



where $\mathcal{C}_k$ is the codebook to be used in coherence block $k$ when messages $(w_0, \ldots, w_k)$ are revealed$^4$. Codebook
$\mathcal{C}_k$ consists of a total of $2^{MR(k+1)}$ codewords and each codeword is assigned to one element in the set

$$\mathcal{I}_M^{k+1} = \{ (w_0, \ldots, w_k) : w_0 \in \mathcal{I}_M, \ldots, w_k \in \mathcal{I}_M \}, \tag{25}$$

where $\mathcal{I}_M = \{1, 2, \ldots, 2^{MR}\}$. All codewords are length $M$ sequences whose symbols are sampled i.i.d. from
$\mathcal{CN}\left(0, \frac{\rho}{N_t}\right)$. In coherence block $k$, the encoder observes $w_0, \ldots, w_k$, maps it to the codeword $\mathbf{X}_k(w_0^k) \in \mathbb{C}^{N_t \times M}$
in $\mathcal{C}_k$, and transmits each of the $M$ columns of $\mathbf{X}_k$ over $M$ channel uses. The entire transmitted sequence up to
and including block $k$ is denoted by

$$\mathbf{X}_0^k(w_0^k) \triangleq \{ \mathbf{X}_0(w_0^0), \mathbf{X}_1(w_0^1), \ldots, \mathbf{X}_k(w_0^k) \}, \quad \mathbf{X}_0^k(w_0^k) \in \mathbb{C}^{N_t \times (k+1)M}. \tag{26}$$

$^4$ We will make the practically relevant assumption that the communication terminates after a sufficiently large but fixed number of coherence
blocks.

For decoding message $w_k$, our proposed decoder does not rely on previously decoded messages, but instead
computes a new estimate of all the messages $\hat{w}_0^k$ at time

$$T_k = k + T - 1 \tag{27}$$

using the entire received sequence $\mathbf{Y}_0^{T_k} = (\mathbf{Y}_0, \ldots, \mathbf{Y}_{T_k})$. First it searches for a message $\hat{w}_0$ by searching over
all message sequences $w_0^{T_k}$ such that $(\mathbf{X}_0^{T_k}(w_0^{T_k}), \mathbf{Y}_0^{T_k})$ are jointly typical. If each such sequence has a unique
prefix $\hat{w}_0$ then $\hat{w}_0$ is selected as the message in block 0. Otherwise an error is declared. The decoder then proceeds
sequentially, producing estimates of $\hat{w}_1, \ldots, \hat{w}_k$. In determining $\hat{w}_l$, $l \le k$, the decoder uses the already-determined
vector of estimates $\hat{w}_0^{l-1}$. The decoder searches for a sequence of messages $w_l^{T_k}$ such that the corresponding
transmit sequence $\mathbf{X}_0^{T_k}(\hat{w}_0^{l-1}, w_l^{T_k})$ has the property that the sub-sequence between $l$ and $T_k$ (the suffix) satisfies

$$\left(\mathbf{X}_l^{T_k}(\hat{w}_0^{l-1}, w_l^{T_k}), \mathbf{Y}_l^{T_k}\right) \in \mathcal{T}_{l,T_k}, \tag{28}$$

where the set $\mathcal{T}_{l,l'}$ is the set of all jointly typical sequences [53],



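$$\mathcal{T}_{l,l'} = \left\{ (\mathbf{X}_l^{l'}, \mathbf{Y}_l^{l'}) : \mathbf{X}_l^{l'} \in T(p_{\mathbf{X}_l^{l'}}),\ \mathbf{Y}_l^{l'} \in T(p_{\mathbf{Y}_l^{l'}}),\ \left| \frac{\sum_{k=l}^{l'} \left[ \log p_{\mathbf{X}_k, \mathbf{Y}_k}(\mathbf{X}_k, \mathbf{Y}_k) - h(p_{\mathbf{X}_k, \mathbf{Y}_k}) \right]}{M(l' - l + 1)} \right| \le \varepsilon \right\} \tag{29}$$

where $T(p_{\mathbf{X}_l^{l'}})$ and $T(p_{\mathbf{Y}_l^{l'}})$ denote the sets of typical $\{\mathbf{X}_l^{l'}\}$ and $\{\mathbf{Y}_l^{l'}\}$ sequences respectively and where $h(p_{\mathbf{X}_k, \mathbf{Y}_k})$
denotes the differential entropy of jointly Gaussian random variables.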

If the list of all message sequences $w_l^{T_k}$ that satisfy (28) have a unique prefix $\hat{w}_l$ then we concatenate $\hat{w}_l$ with
$\hat{w}_0^{l-1}$ to get $\hat{w}_0^{l}$, otherwise an error is declared. When the process continues to step $k + 1$ without declaring an error,
message $\hat{w}_k$ is declared to be the output message.
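
The control flow of this decoder can be sketched independently of the typicality test itself. In the sketch below, `typical_suffixes(l, prefix)` is a hypothetical oracle standing in for the search in (28): it returns the set of message tuples $(w_l, \ldots, w_{T_k})$ whose suffix codewords pass the test given the already-decided prefix. The toy oracles and all names are assumptions made for illustration only.

```python
def decision_directed_decode(k, typical_suffixes):
    """Return the estimate of w_k, or None if an error is declared at some step."""
    prefix = ()
    for l in range(k + 1):
        # First message of every surviving candidate suffix at step l.
        heads = {suffix[0] for suffix in typical_suffixes(l, prefix)}
        if len(heads) != 1:          # no candidate, or candidates disagree on w_l
            return None
        prefix += (heads.pop(),)     # fix w_l and move on to step l + 1
    return prefix[k]


def toy_oracle(l, prefix):
    # Hand-made candidate lists for k = 1 and T_k = 3; purely illustrative.
    table = {
        0: {(3, 1, 4, 1), (3, 2, 2, 2)},   # candidates agree on w_0 = 3
        1: {(1, 4, 1), (1, 0, 0)},         # candidates agree on w_1 = 1
    }
    return table[l]


def ambiguous_oracle(l, prefix):
    return {(3, 1), (5, 1)}                # candidates disagree on the first message
```

Here `decision_directed_decode(1, toy_oracle)` returns `1`: several candidate paths survive at each step, but they share a unique prefix, which is all the decoder requires.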

Remark 1: Our decoder is a decision directed decoder. In estimating $\hat{w}_k$, it first estimates $\hat{w}_0$ based on $\mathbf{Y}_0^{T_k}$. It
next makes a conditional estimate of $\hat{w}_1$ based on $\mathbf{Y}_1^{T_k}$ with $\hat{w}_0$ fixed, and continues along in $k + 1$ steps. One may
be tempted to try a simpler decoding scheme that avoids the $k + 1$ steps and directly searches for a unique prefix $\hat{w}_0^k$
such that the resulting transmit sequence $\mathbf{X}_0^{T_k}$ is jointly typical with the received sequence $\mathbf{Y}_0^{T_k}$, i.e.,

$$\left| \frac{-\sum_{j=0}^{T_k} \left[ \log p_{\mathbf{X}_j, \mathbf{Y}_j}(\mathbf{X}_j, \mathbf{Y}_j) - h(p_{\mathbf{X}_j, \mathbf{Y}_j}) \right]}{M(k + T)} \right| \le \varepsilon. \tag{30}$$

Such an approach will not guarantee the recovery of the true $w_k$. This is because for $k \gg 1$ the contribution of the
terms before $w_k$ will dominate. Even when $\{\hat{w}_k \ne w_k\}$ but $\hat{w}_0^{k-1} = w_0^{k-1}$, the pair $(\mathbf{X}_0^{T_k}, \mathbf{Y}_0^{T_k})$ will in general
satisfy (30) as the contribution of the suffix associated with $w_k$ will be negligible.

Our proposed decision directed decoder guarantees that when decoding $w_k$ we do not include the bias introduced
by $w_0^{k-1}$ in



A. Analysis of error probability 

We show that for any $\delta > 0$ and $0 < r < \min(N_r, N_t)$, the error probability averaged over the ensemble of
codebooks $\mathcal{C}(R = (r - \delta) \log_2 \rho, T)$ satisfies $\Pr(\mathcal{E}) \,\dot{\le}\, \rho^{-T \cdot d_1(r)}$. By symmetry we will assume, without loss of
generality, that a particular message sequence $\mathbf{w}_0^{T_k} = w_0^{T_k}$ is transmitted.
For the analysis of error probability we define the events

$$\mathcal{E}_l = \left\{ \hat{w}_0^l : (\hat{w}_0, \ldots, \hat{w}_{l-1}) = (w_0, \ldots, w_{l-1}),\ \hat{w}_l \ne w_l \right\}, \quad 0 \le l \le k \tag{31}$$

and note that

$$\Pr\{\hat{w}_k \ne w_k\} \le \sum_{l=0}^{k} \Pr(\mathcal{E}_l), \tag{32}$$

where $\mathcal{E}_l$ corresponds to the event that our proposed decoder fails in step $l$ of the decoding process. We develop
an upper bound on $\Pr(\mathcal{E}_l)$ for each $0 \le l \le k$ and substitute these bounds in (32).
We further express $\mathcal{E}_l = \mathcal{A}_l \cup \mathcal{B}_l$, where

$$\mathcal{A}_l = \left\{ \left(\mathbf{X}_l^{T_k}(w_0^{T_k}), \mathbf{Y}_l^{T_k}\right) \notin \mathcal{T}_{l,T_k} \right\} \tag{33}$$

denotes the event that a decoding failure happens because the transmitted sub-sequence starting from position $l$
fails to be typical with the received sequence whereas

$$\mathcal{B}_l = \left\{ \exists\, \hat{w}_0^{T_k} : \hat{w}_0^{l-1} = w_0^{l-1},\ \hat{w}_l \ne w_l,\ \left(\mathbf{X}_l^{T_k}(\hat{w}_0^{T_k}), \mathbf{Y}_l^{T_k}\right) \in \mathcal{T}_{l,T_k} \right\} \tag{34}$$

denotes the event that the decoding failure happens because a transmit sequence corresponding to a message sequence
with $\hat{w}_l \ne w_l$ appears typical with the received sequence.

As shown in Appendix B, using an appropriate Chernoff bound we can express

$$\Pr(\mathcal{A}_l) \le 2^{-M(T_k - l + 1) f(\varepsilon)} \tag{35}$$

where $f(\varepsilon)$ is a function that satisfies $f(\varepsilon) > 0$ for each $\varepsilon > 0$.

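To bound $\Pr(\mathcal{B}_l)$ we begin by noting that by our code construction, we are guaranteed that whenever $\hat{w}_l \ne w_l$,
the associated transmit subsequence $\mathbf{X}_l^{T_k}(\hat{w}_0^{T_k})$ is sampled independently from $\mathbf{Y}_l^{T_k}$. Hence from the joint typicality
analysis [53], we have that for any sequence $\hat{w}_0^{T_k}$ with $\hat{w}_l \ne w_l$

$$\Pr\left( \left(\mathbf{X}_l^{T_k}(\hat{w}_0^{T_k}), \mathbf{Y}_l^{T_k}\right) \in \mathcal{T}_{l,T_k} \,\middle|\, \mathbf{H}_l^{T_k} \right) \le 2^{-M\left(\sum_{j=l}^{T_k} I(\mathbf{X}_j(1); \mathbf{Y}_j(1)) - 3\varepsilon\right)} = 2^{-M\left(\sum_{j=l}^{T_k} C_j(\rho) - 3\varepsilon\right)}$$

where

$$C_j(\rho) = \log\det\left(I + \frac{\rho}{N_t} H_j H_j^\dagger\right) \tag{36}$$

is the associated mutual information between the input and output in the $j$-th coherence block when the channel
matrix equals $\mathbf{H}_j = H_j$. Applying the union bound we have that

$$\Pr(\mathcal{B}_l \mid \mathbf{H}_l^{T_k}) \le 2^{-M\left(\sum_{j=l}^{T_k} C_j(\rho) - (T_k - l + 1) R - 3\varepsilon\right)}. \tag{37}$$

To bound $\Pr(\mathcal{B}_l)$ we define

$$\mathcal{O}_l = \left\{ (H_l, \ldots, H_{T_k}) : \sum_{j=l}^{T_k} C_j(\rho) \le (k + T - l)\, r \log \rho + (k - l) \Delta(r) \log \rho + 4\varepsilon \log \rho \right\} \tag{38}$$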
where

A W = WH - 0<r<nnn(AW 09) 

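Note that

$$\Pr(\mathcal{B}_l) \le \Pr(\mathcal{B}_l \mid \mathbf{H}_l^{T_k} \in \mathcal{O}_l^c) + \Pr(\mathbf{H}_l^{T_k} \in \mathcal{O}_l). \tag{40}$$

From (38) and (37) we have

\begin{align}
\Pr(\mathcal{B}_l \mid \mathbf{H}_l^{T_k} \in \mathcal{O}_l^c) &\le 2^{-M(k - l)\Delta(r) \log \rho - M \varepsilon \log \rho} \tag{41}\\
&= \rho^{-M\varepsilon - M(k - l)\Delta(r)}. \tag{42}
\end{align}

Thus it remains to bound $\Pr(\mathcal{O}_l)$ in (40). Note that $\mathcal{O}_l$ precisely corresponds to the outage event of the parallel MIMO channel in
Corollary 1 with $L = T_k - l + 1$ and the multiplexing gain of $s = Lr + (k - l)\Delta(r) + 4\varepsilon$. The associated diversity
order is given by:

\begin{align}
L \cdot d_1\left(\frac{s}{L}\right) &= L \left(N_r - \frac{s}{L}\right)\left(N_t - \frac{s}{L}\right) \tag{43}\\
&\ge L (N_r - r)(N_t - r) - (k - l)(N_t + N_r - 2r)\Delta(r) - o_\varepsilon(1) \tag{44}
\end{align}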

= T(N t - r)(N t - r) + ^-^(M - r)(JV r - r) - o e (l) (45) 

= Td 1 (r) + ^di(r)-o B (l) (46) 

where we substituted (39) for $\Delta(r)$ in (43) and let $o_\varepsilon(1)$ be a function of $\varepsilon$ that vanishes as $\varepsilon \to 0$.
Thus we have 

From (40) and substituting (42) and (47) and using $\mathcal{E}_l = \mathcal{A}_l \cup \mathcal{B}_l$ we have

$$\Pr(\mathcal{E}_l) \le \Pr(\mathcal{A}_l) + \Pr(\mathcal{B}_l) \tag{48}$$

< 2 -A/(T fc -i+l)/(e) + p -Me-M{k-l)Hr) + p -Td t (r)- &£±di (r)+o e (1) _ (49) 

From the union bound,

$$\Pr(\mathcal{E}) \le \sum_{l=0}^{k} \Pr(\mathcal{E}_l) \tag{50}$$

< ^2 2- M{Tk ~ i+i)f{£) + p- Me - M ^- i ) A ^ + ^2 p - Td ^- { - h ^ ld ^ r )+ o ^ 1 ) (5i) 

i=o i=o i=o 

Substituting for $T_k = k + T - 1$, we can express the first term in (51) as

\begin{align}
\sum_{l=0}^{k} 2^{-M(k + T - l) f(\varepsilon)} &= \sum_{l=0}^{k} 2^{-M(l + T) f(\varepsilon)} \tag{52}\\
&\le \sum_{l=0}^{\infty} 2^{-M(l + T) f(\varepsilon)} = \frac{2^{-MT f(\varepsilon)}}{1 - 2^{-M f(\varepsilon)}}, \tag{53}
\end{align}

which vanishes as $M \to \infty$. By a similar argument we can simplify the second and third terms in (51) to get

$$\Pr(\mathcal{E}) \le o_M(1) + \rho^{-T d_1(r) + o_\varepsilon(1)} \tag{54}$$

where $o_M(1) \to 0$ as $M \to \infty$. Since $\varepsilon > 0$ is arbitrary we have established that the DMT of $T d_1(r)$ is achievable.

Fig. 7. Streaming setup with two messages arriving in each coherence block. The first message, $w_{k,1}$, arrives $\Delta M$ symbols from the start of
the coherence block while the second message $w_{k,2}$ arrives $M/2$ symbols later. We assume a decoding delay of $M$ symbols for each message.
The left figure shows the case when $\Delta = 0$, while the right figure shows the case when $\Delta = \frac{1}{2}$.



VII. Multiple Messages per Coherence Block 

Our primary focus in this paper has been the case when there is one message per coherence block, i.e., we
assume that the message $w_k$ arrives at the beginning of coherence block $k$, and needs to be reconstructed after
a delay of $T \cdot M$ symbols. In this section we will consider the case when two messages, say $w_{k,1}$ and $w_{k,2}$,
arrive in each coherence block. We assume that $w_{k,1}$ arrives at time $t_{k,1} = kM + \Delta M$ and $w_{k,2}$ arrives at time
$t_{k,2} = kM + \left(\Delta + \frac{1}{2}\right) M$ where $\Delta \in [0, 1/2]$. We will assume that each message $w_{k,i}$ is uniformly distributed in
the set $\mathcal{I}_M = \{1, 2, \ldots, 2^{MR/2}\}$ and each message has a decoding delay of $T \cdot M$ symbols. We will restrict our
discussion to the case when $T = 1$.

In general when multiple messages arrive in each coherence block there exists an asymmetry in channel conditions
experienced by these messages. For example in Fig. 7 when $\Delta = 0$ the message $w_{k,1}$ only spans one coherence
block whereas the message $w_{k,2}$ spans two coherence blocks and thus sees two independent fading gains.
Therefore the simple interleaving technique which was optimal in the case of a single message may not be optimal
when there are multiple messages in each block. The following result shows that this is indeed the case.

Proposition 1: The optimal DMT of the SISO streaming setup with two messages per coherence block, and with
$\Delta = 0$ and $T = 1$, is

$$d(r) = \min\left(1 - \frac{r}{2},\ 2 - 2r\right), \quad r \in [0, 1]. \tag{55}$$



Proof: The upper bound is based on the following observation. The bound $d(r) \le 1 - r/2$ follows by revealing
every message $w_{k,2}$ to the destination. The bound $d(r) \le 2 - 2r$ follows by revealing message $w_{k,2}$ at the start
of coherence block $k$ and relaxing the deadline of $w_k$ and $w_{k+1}$ such that both only need to be recovered at the
end of the coherence block $k + 1$. From Theorem 2 the associated DMT of this setup is $d(r) = 2 - 2r$. The upper
bound follows.
The achievability is as follows. We split each message $w_{k,1}$ into two equal sized messages $(w_{k,1}^1, w_{k,1}^2)$ of rate
$R_0 = R/4$. We do not split the messages $w_{k,2}$. We sample three Gaussian codebooks as follows.

• The codebook $\mathcal{C}_A$ consisting of $2^{3MR_0}$ codewords $x_A^{M/2}$ sampled i.i.d. from $\mathcal{CN}(0, \rho)$. Each pair $(w_{k,1}^1, w_{k-1,2})$
is mapped to a unique codeword $x_A(w_{k,1}^1, w_{k-1,2})$.

• The codebook $\mathcal{C}_B$ consisting of $2^{MR_0}$ codewords $x_B^{M/2}$ sampled i.i.d. from $\mathcal{CN}(0, \rho)$. Each message $w_{k,1}^2$ is
mapped to a unique codeword $x_B(w_{k,1}^2)$.

• The codebook $\mathcal{C}_C$ consisting of $2^{2MR_0}$ codewords $x_C^{M/2}$ sampled i.i.d. from $\mathcal{CN}(0, \rho^{1-\beta})$. Each message $w_{k,2}$
is mapped to a unique codeword $x_C(w_{k,2})$. We will select $\beta = r/2$.

In coherence block $k$, the transmitter transmits $x_{k,I} = x_A(w^1_{k,1}, w_{k-1,2})$ in the first half of the coherence block and
$x_{k,II} = x_B(w^2_{k,1}) + x_C(w_{k,2})$ in the second half of the coherence block. The receiver observes $y_{k,i} = h_k x_{k,i} + z_{k,i}$
for $i \in \{I, II\}$. The decoding of the messages is as follows.

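• The receiver decodes $w^2_{k,1}$ from $y_{k,II}$ treating $x_C$ as additional noise. An outage happens if

$$ \left\{ \frac{1}{2}\log\left(1 + \frac{\rho |h_k|^2}{1 + \rho^{1-\beta}|h_k|^2}\right) < R_0 \right\}. \tag{56} $$

Setting $|h_k|^2 = \rho^{-(1-\alpha_1)}$ and $\beta = r/2$, we can show that (56) is equivalent to

$$ \log\left(1 + \frac{\rho^{\alpha_1}}{1 + \rho^{\alpha_1 - r/2}}\right) < \frac{r}{2}\log\rho, \tag{57} $$

from which it can be shown that $d(r) = 1 - \frac{r}{2}$.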

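• After decoding $w^2_{k,1}$ the decoder subtracts $x_B(w^2_{k,1})$ from $y_{k,II}$, i.e., $\tilde{y}_{k,II} = y_{k,II} - h_k x_B$. The decoder
searches for a pair $(w_{k,2}, w^1_{k+1,1})$ such that $(x_C(w_{k,2}), \tilde{y}_{k,II})$ are jointly typical and $(x_A(w^1_{k+1,1}, w_{k,2}), y_{k+1,I})$ are
jointly typical. An outage happens if

$$ \left\{ \frac{1}{2}\log\left(1 + \rho^{1-\beta}|h_k|^2\right) + \frac{1}{2}\log\left(1 + \rho|h_{k+1}|^2\right) < \frac{3}{4} r \log\rho \right\}. \tag{58} $$

Setting $|h_k|^2 = \rho^{-(1-\alpha_1)}$, $|h_{k+1}|^2 = \rho^{-(1-\alpha_2)}$ and $\beta = r/2$ in (58) gives

$$ \log\left(1 + \rho^{\alpha_1 - r/2}\right) + \log\left(1 + \rho^{\alpha_2}\right) < \frac{3r}{2}\log\rho. \tag{59} $$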

The associated DMT is given by

$$ d_2(r) = \min_{(\alpha_1,\alpha_2) \in \mathcal{A}} (1 - \alpha_1)^+ + (1 - \alpha_2)^+ \tag{60} $$

where $\mathcal{A} = \{(\alpha_1, \alpha_2) \ge 0 : (\alpha_1 - r/2)^+ + \alpha_2 < 3r/2\}$.
It can be deduced that $d_2(r) = 2 - 2r$. Thus the DMT associated with the decoding of $(w_{k,2}, w_{k+1,1})$ equals
$\min(1 - r/2,\, 2 - 2r)$. Since this analysis can be applied for each $k$, (55) follows. ■
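As a quick numerical sanity check of the last step, the minimization in (60) can be evaluated by brute force. The sketch below is illustrative only: the function names, the grid resolution and the restriction of $(\alpha_1, \alpha_2)$ to $[0,1]^2$ are assumptions made here (exponents above 1 cannot lower the objective), and the strict inequality defining $\mathcal{A}$ is relaxed to a non-strict one so that the infimum is attained on the grid.

```python
def d2_grid(r, steps=200):
    """Brute-force value of (60) on a uniform grid over [0, 1]^2 (assumed range)."""
    best = float("inf")
    for i in range(steps + 1):
        a1 = i / steps
        for j in range(steps + 1):
            a2 = j / steps
            # Outage set A from the text, with '<' relaxed to '<=' (small tolerance).
            if max(a1 - r / 2, 0.0) + a2 <= 1.5 * r + 1e-12:
                best = min(best, (1 - a1) + (1 - a2))
    return best


def dmt_prop1(r):
    """Closed form (55): min(1 - r/2, 2 - 2r) for r in [0, 1]."""
    return min(1 - r / 2, 2 - 2 * r)


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(r, d2_grid(r), 2 - 2 * r, dmt_prop1(r))
```

At the sampled rates the grid value should agree with $2 - 2r$. The two branches of (55) cross at $r = 2/3$: below that rate the single-block term $1 - r/2$ is the bottleneck, above it the two-block term $2 - 2r$ is.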
We note that if $\Delta = 1/2$ and the decoding delay equals $M$ symbols (i.e., $T = 1$), then message $w_{k,1}$ spans two
coherence blocks whereas $w_{k,2}$ spans only one coherence block. By reversing the roles of $w_{k,1}$ and $w_{k,2}$ in the coding
scheme in Prop. [1] we can still achieve the DMT in (55). However the following result shows that we cannot have
a universal coding scheme, oblivious of $\Delta$, that achieves the same DMT.

Proposition 2: Consider the SISO channel model with two messages in each coherence block as in Prop. [1].
Assume that either $\Delta = 0$ or $\Delta = 1/2$, but the actual value of $\Delta$ is known only to the receiver. The DMT for this
setup equals $d(r) = 1 - r$.

Proof: The achievability is straightforward. Each message $w_{k,j}$ is mapped to a codeword of length $M/2$ of a
Gaussian codebook and transmitted immediately. Since each message is of rate $\frac{r}{2}\log\rho$, the DMT of $d(r) = 1 - r$
is achievable.

For the converse, we consider a multicast setup with two receivers. In coherence block $k$ the transmitter transmits
$x_{k,I}$ in the first half of the coherence block and $x_{k,II}$ in the second half, i.e., $x_k = [x_{k,I}\ x_{k,II}]$, where
both $x_{k,I}, x_{k,II} \in \mathbb{C}^{M/2}$. Receiver 1 observes $y_k = [y_{k,I}\ y_{k,II}]$ in coherence block $k$ as follows:

$$ y_{k,I} = h_k x_{k,I} + n_{k,I,1}, \tag{61} $$
$$ y_{k,II} = h_k x_{k,II} + n_{k,II,1}, \tag{62} $$

whereas receiver 2 observes $v_k = [v_{k,I}\ v_{k,II}]$ in coherence block $k$ as follows:

$$ v_{k,I} = h_k x_{k,I} + n_{k,I,2}, \tag{63} $$
$$ v_{k,II} = h_{k+1} x_{k,II} + n_{k,II,2}, \tag{64} $$

where the noise variables $n_{k,j,i}$ have i.i.d. $\mathcal{CN}(0,1)$ entries. For both receivers, message $w_{k,1}$ must be decoded
at the end of coherence block $k$ and message $w_{k,2}$ must be decoded in the middle of coherence block $k+1$.
Note that the duration of $w_{k,1}$ spans only one fading state, $h_k$, for receiver 1, while $w_{k,2}$ spans only one fading state,
$h_{k+1}$, for receiver 2. By construction any feasible coding scheme for the original channel where the transmitter is
oblivious of $\Delta$ must be simultaneously feasible for the two receivers on the multicast channel. We show that under
this constraint $d(r) = 1 - r$ is the maximum possible DMT.
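To make the two-receiver model concrete, the following minimal sketch generates the outputs (61)-(64) for one coherence block. It is an illustration under stated assumptions, not part of the proof: the function names `cn` and `multicast_block`, the list-based representation of the half-block codewords and the optional noise-free mode are all choices made here.

```python
import random


def cn(rng):
    """One CN(0, 1) sample: real and imaginary parts each have variance 1/2."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def multicast_block(k, x_first, x_second, h, rng=None):
    """Outputs of both receivers in coherence block k, following (61)-(64).

    x_first, x_second: the two half-block codewords (lists of complex symbols).
    h: list of fading gains, indexed by coherence block.
    rng: a random.Random instance, or None for noise-free outputs.
    """
    noise = (lambda: cn(rng)) if rng is not None else (lambda: 0j)
    y_first = [h[k] * x + noise() for x in x_first]        # (61)
    y_second = [h[k] * x + noise() for x in x_second]      # (62)
    v_first = [h[k] * x + noise() for x in x_first]        # (63)
    v_second = [h[k + 1] * x + noise() for x in x_second]  # (64): next gain
    return (y_first, y_second), (v_first, v_second)


if __name__ == "__main__":
    gains = [1 + 1j, 2 - 1j]
    y, v = multicast_block(0, [1, 1j], [2, -1], gains, random.Random(0))
    print(y, v)
```

The only difference between the receivers is the gain applied to the second half-block: receiver 1 sees $h_k$ throughout, while receiver 2 sees $h_{k+1}$ in the second half.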

We begin by considering Fano's inequality for receiver 1 for message $w_{0,1}$ and rate $\frac{Mr}{2}\log\rho$:

$$ \Pr(\mathcal{E}_{0,1};\, h_0 = h_0) \ge 1 - \frac{2}{M r \log\rho} - \frac{2 I(w_{0,1}; y_0 \mid h_0 = h_0)}{M r \log\rho}. \tag{65} $$

Ignoring the second term, which goes to zero as $M \to \infty$, and using the same sequence of steps leading to ( TToT )
we have with $P_\delta = \rho^{-(1-r+\delta)}$

p.<^>ft(i- "'^;*' ) 

(67) 

\ Mrlogp J 

V Mr log p J 

where (67) follows from the fact that is independent of $(w_0, y_0, h_0)$.
Similarly applying Fano's inequality for receiver 2 for message $w_{0,2}$ we have

pr( } A _ 2/( Wo , 2 ;v OJJ v l jko , 1 ,/ il e^) \ 

\ Mr log p / 

= p L _ 2/(wo, 2 ;vo,jj,vi,/|wb,i,C +1 e < +2 A (?0) 

I Mr log p y 

^ A 2/(^ 2 ;y^, Vo ^| Wo , 1 ,C+ 1 E< +2 ) \ 
I Mr log p J 

Likewise we can show that for each $k \le N - 1$

Pr(£ k i) > P s 1 V ' - J ° ' 1 ^ — 5 — '- (72) 

V M; " 6 \ Mrlogp j V 7 



pr(f fca ) > pji - 2I ( w <°*rt><\ h o llMl '< 

I Mr log p 



Thus we have that 



max max{Pr(£ fe i,Pr(£ fe 2 ))} 

0<fc<JV-l 



iV-l 
fc=0 

~ 5 I TVMr log p 

/ T(m n -m n \i N \h N+1 i=1/ N + 1 \\ 

" 5 ^ iVMrlogp y 

> 



(73) 



(74) 
(75) 
(76) 



„/, Efc=o H*k,r, yfcj, Vfc,j|/Jfc) + i"(xfc,jj; y fc ,/j, Vk,ii\h k , h k+1 ) \ 
S \ N Mrlogp ) ( ' 

> f ! _ jV + l + (jV + l)(r-<5)logp \ 

\ Nr log p J 

■ 

where the steps leading to (78) are similar to (EH and hence not elaborated. For $N$ sufficiently large the expression
in the brackets in (78) is positive. This establishes that $d(r) \le 1 - r + \delta$ must hold. Since $\delta > 0$ is arbitrary this
concludes the converse in Prop. [2].

We conclude this section with the following remark. When multiple messages arrive at equal intervals
in each coherence block, different messages observe different channel conditions. Prop. [1] shows that coding schemes
that exploit this asymmetry across the messages indeed improve the DMT. On the other hand such schemes crucially
rely on where the messages arrive in each block. If such information is not available the DMT is in general smaller,
as established in Prop. [2].
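The size of the loss can be read off directly from the two closed forms. The short sketch below tabulates them side by side; the function names are hypothetical and the code simply evaluates (55) and the expression $1 - r$ from Prop. [2].

```python
def dmt_aware(r):
    """Prop. 1, (55): the arrival offset is known and the asymmetry is exploited."""
    return min(1 - r / 2, 2 - 2 * r)


def dmt_oblivious(r):
    """Prop. 2: the transmitter does not know the arrival offset."""
    return 1 - r


def dmt_gap(r):
    """Diversity lost by an offset-oblivious transmitter at multiplexing gain r."""
    return dmt_aware(r) - dmt_oblivious(r)


if __name__ == "__main__":
    for i in range(11):
        r = i / 10
        print(f"r={r:.1f}  aware={dmt_aware(r):.3f}  "
              f"oblivious={dmt_oblivious(r):.3f}  gap={dmt_gap(r):.3f}")
```

The gap equals $r/2$ for $r \le 2/3$ and $1 - r$ for $r \ge 2/3$, so it vanishes at both endpoints of $[0,1]$ and is largest, $1/3$, at $r = 2/3$.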



VIII. Conclusions 

We studied the problem of delay-constrained streaming over a block-fading channel and established the
associated diversity-multiplexing tradeoff when there is one message arriving in each coherence block. The converse
is based on an outage-amplification argument and does not follow from simple reductions to known upper bounds.
The DMT can be achieved using an interleaving scheme that reduces the system to a set of parallel independent 
channels. We also show that another coding scheme, that uses a sequential tree code, can achieve the DMT in a 
delay universal fashion. We also discuss some extensions when multiple messages arrive in each coherence block. 

We believe that the fundamental limits of delay-constrained streaming over fading channels are not well understood 
and the techniques developed in this work can be a useful starting point for many other scenarios in wireless 
communications. 



Appendix A 
Proof of Corollary [1]

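Let us define

$$ C_l(\rho) = \log\det\left(I + \frac{\rho}{N_t} H_l H_l^\dagger\right), \quad l = 1, \ldots, L, \tag{79} $$

which is the maximum mutual information over channel $l$. Using Theorem [1] we have that

$$ \Pr(C_l(\rho) < r\log\rho) \doteq \rho^{-d_1(r)}. \tag{80} $$

Let $S(\rho) = \sum_{l=1}^{L} C_l(\rho)$. Then

\begin{align}
\Pr(S(\rho) < s\log\rho)
&= \Pr\Biggl(\bigcup_{\substack{r_1,\ldots,r_L \\ \sum_i r_i \le s,\ r_i \ge 0}} \bigcap_{i=1}^{L} \{C_i(\rho) < r_i\log\rho\}\Biggr) \tag{81} \\
&\le \sum_{\substack{r_1,\ldots,r_L \\ \sum_i r_i \le s,\ r_i \ge 0}} \Pr\bigl(\{C_i(\rho) < r_i\log\rho\}_{i=1}^{L}\bigr) \tag{82} \\
&= \sum_{\substack{r_1,\ldots,r_L \\ \sum_i r_i \le s,\ r_i \ge 0}} \prod_{i=1}^{L} \Pr\bigl(C_i(\rho) < r_i\log\rho\bigr) \tag{83} \\
&\doteq \sum_{\substack{r_1,\ldots,r_L \\ \sum_i r_i \le s,\ r_i \ge 0}} \rho^{-\sum_{i=1}^{L} d_1(r_i)} \tag{84}
\end{align}

where (82) follows from the union bound, (83) follows because the random variables in (79) are i.i.d. random
variables and (84) follows by substituting in (80).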
Applying Varadhan's Lemma [22] we have that

$$ \Pr(S(\rho) < s\log\rho) \doteq \rho^{-d(s)}, \tag{85} $$

where

$$ d(s) = \min_{\substack{r_1,\ldots,r_L \\ \sum_i r_i \le s,\ r_i \ge 0}} \sum_{l=1}^{L} d_1(r_l). \tag{86} $$

Since the objective function on the right hand side of (86) is symmetric and convex in $r_1, \ldots, r_L$, the minimum
occurs when $r_1 = r_2 = \cdots = r_L = s/L$, thus yielding

$$ d(s) = L\, d_1\left(\frac{s}{L}\right), \tag{87} $$

as required.
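The equal-split conclusion can be checked numerically for a small case. The sketch below is a hedged illustration: it assumes that $d_1$ is the standard piecewise-linear single-channel DMT through the points $(k, (N_t - k)(N_r - k))$, takes $L = 2$, and uses hypothetical function names and a coarse grid chosen here.

```python
def d1(r, nt, nr):
    """Piecewise-linear single-channel DMT through (k, (nt-k)(nr-k)) (assumed form)."""
    m = min(nt, nr)
    if r >= m:
        return 0.0
    k = int(r)  # floor, valid because r >= 0
    left = (nt - k) * (nr - k)
    right = (nt - k - 1) * (nr - k - 1)
    return left + (right - left) * (r - k)


def d_parallel_grid(s, nt, nr, steps=100):
    """Brute-force (86) for L = 2 over splits with r1 + r2 <= s and r1, r2 >= 0."""
    best = float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            r1, r2 = i * s / steps, j * s / steps
            best = min(best, d1(r1, nt, nr) + d1(r2, nt, nr))
    return best


if __name__ == "__main__":
    for s in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(s, d_parallel_grid(s, 2, 2), 2 * d1(s / 2, 2, 2))
```

For a $2 \times 2$ channel the grid minimum should match $2\, d_1(s/2)$ at each sampled $s$. Because $d_1$ is linear between integer points, the minimizer need not be unique, but the equal split always attains the minimum.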



Appendix B 
Proof of (35)

Our proof is based on the Chernoff-Cramér theorem of large deviations stated below.

Theorem 3: Suppose that $x_1, \ldots, x_N$ are i.i.d. random variables with a rate function $f_x(\cdot)$ defined as

$$ f_x(t) = \sup_{\theta} \left\{ \theta \cdot t - \log E_x[\exp(\theta \cdot x)] \right\}, \tag{88} $$

and let $M_n = \frac{1}{n}\sum_{i=1}^{n} x_i$. Then there exists a constant $N > 0$ such that for all $n \ge N$ we have

$$ \Pr(M_n \ge t) \le e^{-n f_x(t)}. \tag{89} $$

□
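A simple instance helps fix ideas. The sketch below evaluates the rate function (88) numerically and compares the bound (89) with the exact tail for the special case of standard normal variables, which is an assumption made purely for illustration (the appendix applies the theorem to log-likelihood terms, not to Gaussian samples). For $x \sim \mathcal{N}(0,1)$ the log moment generating function is $\theta^2/2$, so the rate function is $t^2/2$. The function names and the grid over $\theta$ are hypothetical.

```python
import math


def rate_function_std_normal(t, theta_max=10.0, steps=20000):
    """Numerical sup over theta >= 0 of theta*t - theta**2/2, i.e. (88) for N(0, 1).

    Restricting to theta >= 0 is enough for an upper-tail bound with t >= 0.
    """
    best = 0.0  # theta = 0 always gives 0
    for i in range(steps + 1):
        theta = theta_max * i / steps
        best = max(best, theta * t - theta * theta / 2)
    return best


def tail_of_sample_mean(n, t):
    """Exact Pr(M_n >= t) when the x_i are N(0, 1), so that M_n ~ N(0, 1/n)."""
    return 0.5 * math.erfc(t * math.sqrt(n / 2))


def chernoff_bound(n, t):
    """Right-hand side of (89) with the numerically evaluated rate function."""
    return math.exp(-n * rate_function_std_normal(t))


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, tail_of_sample_mean(n, 0.5), chernoff_bound(n, 0.5))
```

In this Gaussian case the bound holds for every $n$, and both sides decay exponentially in $n$ with the same exponent $t^2/2$.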



Recall that $A_{l,l'}$ is the event that the true codeword is not jointly typical with the received sequence. To upper
bound the probability we can ignore the marginal typicality constraints and use

$$ \Pr\left( \left| \frac{\sum_{k=l}^{l'} \left[\log p_{X_k,Y_k}(X_k, Y_k) - h(p_{X_k,Y_k})\right]}{M(l' - l + 1)} \right| \ge \epsilon \right). \tag{90} $$


Note that as $Y_k = H_k X_k + Z_k$, the $H_k$ are known to the decoder, and the noise sequence $\{Z_k\}$ is independent,

\begin{align}
p_{X_k,Y_k}(X_k, Y_k) &= p_{X_k}(X_k)\, p_{Y_k|X_k}(Y_k \mid X_k) \tag{91} \\
&= p_{X_k}(X_k)\, p_{Z_k}(Y_k - H_k X_k) \tag{92} \\
&= p_X(X_k)\, p_Z(Z_k), \tag{93}
\end{align}

where the last equality holds since the codewords are sampled i.i.d. and the noise is also i.i.d. Thus $h(p_{X_k,Y_k}) =
h(p_X) + h(p_Z)$. And so



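\begin{align}
\left| \sum_{k=l}^{l'} \left[\log p_{X_k,Y_k}(X_k, Y_k) - h(p_{X_k,Y_k})\right] \right|
&= \left| \sum_{k=l}^{l'} \left[\log p_X(X_k) + \log p_Z(Z_k) - h(p_X) - h(p_Z)\right] \right| \tag{94} \\
&\le \left| \sum_{k=l}^{l'} \left[\log p_X(X_k) - h(p_X)\right] \right| + \left| \sum_{k=l}^{l'} \left[\log p_Z(Z_k) - h(p_Z)\right] \right| \tag{95}
\end{align}

where the last step follows from the triangle inequality. Substituting (95) into (90) and using the union
bound we have

$$ \Pr(A_{l,l'}) \le \Pr(A^X_{l,l'}) + \Pr(A^Z_{l,l'}) \tag{96} $$

where we define

$$ A^X_{l,l'} = \left\{ \left| \frac{\sum_{k=l}^{l'} \left[\log p_X(X_k) - h(p_X)\right]}{M(l' - l + 1)} \right| \ge \epsilon \right\}, \tag{97} $$

$$ A^Z_{l,l'} = \left\{ \left| \frac{\sum_{k=l}^{l'} \left[\log p_Z(Z_k) - h(p_Z)\right]}{M(l' - l + 1)} \right| \ge \epsilon \right\}. \tag{98} $$
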

Note that $X_k$ is a sequence of $M$ i.i.d. random vectors each sampled from $\mathcal{CN}(0, \frac{\rho}{N_t} I)$ and $E[\log p_X(X_k)] = h(p_X)$.
Similarly, $E[\log p_Z(Z_k)] = h(p_Z)$. Then using Theorem [3] there exist functions $f_X(\epsilon)$ and $f_Z(\epsilon)$ such that for
sufficiently large $N = M(l' - l + 1)$

$$ \Pr(A^X_{l,l'}) \le \exp\{-M(l' - l + 1) f_X(\epsilon)\}, $$
$$ \Pr(A^Z_{l,l'}) \le \exp\{-M(l' - l + 1) f_Z(\epsilon)\}. $$



Furthermore by directly using (88) we can show that $f_X(\epsilon) > 0$ and $f_Z(\epsilon) > 0$. Setting $f(\epsilon) = \max(f_X(\epsilon), f_Z(\epsilon))$
establishes (35).